Effects of Multithreading on Data and Workload Distribution for Distributed-Memory Multiprocessors
Authors
Abstract
While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, doing so is often difficult for practical reasons. This report presents our study of multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Several workload distribution strategies are defined according to thread granularity. Three data distribution strategies are investigated: row-wise cyclic, k-way partial-row cyclic, and blocked distribution. We have implemented all of these on the 80-processor EM-4 distributed-memory multiprocessor using two benchmarks: highly sequential Gaussian Elimination with Partial Pivoting and highly parallel Matrix Multiplication. Experimental results indicate that multithreading can offset the loss caused by a mismatch between data distribution and workload distribution, even for sequential and irregular problems, while giving high absolute performance.
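As a rough illustration of the three data distribution strategies named in the abstract, the sketch below maps matrix elements to owning processors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the k-way partial-row cyclic mapping in particular is an assumed reading (each row split into k contiguous segments, with segments dealt out cyclically), since the abstract does not define it.

```python
# Hypothetical owner-mapping functions; names and exact formulas are
# illustrative assumptions, not taken from the report.

def cyclic_owner(row: int, num_procs: int) -> int:
    """Row-wise cyclic: row i is stored on processor i mod P."""
    return row % num_procs


def blocked_owner(row: int, num_rows: int, num_procs: int) -> int:
    """Blocked: contiguous groups of ceil(N / P) rows per processor."""
    rows_per_proc = -(-num_rows // num_procs)  # ceiling division
    return row // rows_per_proc


def partial_row_cyclic_owner(row: int, col: int, num_cols: int,
                             k: int, num_procs: int) -> int:
    """Assumed k-way partial-row cyclic: split each row into k
    contiguous segments and deal the segments out cyclically."""
    seg_len = -(-num_cols // k)      # columns per segment (ceiling)
    segment = col // seg_len         # segment index within the row
    return (row * k + segment) % num_procs


if __name__ == "__main__":
    P, N = 4, 8
    print([cyclic_owner(i, P) for i in range(N)])
    print([blocked_owner(i, N, P) for i in range(N)])
    print([partial_row_cyclic_owner(1, j, N, 2, P) for j in range(N)])
```

With cyclic mapping, consecutive rows land on different processors, which tends to balance work as Gaussian elimination shrinks the active submatrix; blocked mapping keeps neighboring rows together at the cost of possible imbalance.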
Similar resources
Data and Workload Distribution in a Multithreaded Architecture
Matching data distribution to workload distribution is important for improving the performance of distributed-memory multiprocessors. While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various reasons including complexity of address computation, runtime data movement, and irregular reso...
Dynamic Characteristics of Multithreaded Execution in the EM-X Multiprocessor
Multithreading is known to be effective for tolerating communication latency in distributed-memory multiprocessors. Two types of support for multithreading have been used to date: software and hardware. This paper presents the impact of multithreading on performance through empirical studies. In particular, we explicate the performance difference between software support and hardware suppor...
System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
This paper overviews results from our recent work on building customized system software support for Distributed Shared Memory Multiprocessors. The mechanisms and policies outlined in this paper are connected with a single conceptual thread: they all attempt to reduce the memory latency of parallel programs by optimizing critical system services, while hiding the complex architectural details o...
Latency Tolerance through Multithreading in Large-Scale Multiprocessors
In large-scale distributed-memory multiprocessors, remote memory accesses suffer significant latencies. Caches help alleviate the memory latency problem by maintaining local copies of frequently used data. However, they cannot eliminate the latency caused by first-time references and invalidations needed to enforce cache coherence. Multithreaded processors tolerate such latencies by rapidly switchi...
Scheduling to Reduce Memory Coherence Overhead on Coarse-grain Multiprocessors
Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead on an application since all replicated data must be kept coherent. We examine the effect of task schedu...